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1*  INTRODUCTION 

In  my  previous  work  on  non-linear  filtering  for  diffusion  processes  [Mitter 
1980,1983]  I  have  discussed  the  analogies  that  exist  between  these  problems  and 
problems  of  quantum  physics  from  the  Feynman  point  of  view  £cf.  Glimm-Jaffe  1981]. 
The  basic  idea  here  is  that',  construction  of  a  non-linear  filter  involvea  an 
integration  over  function  space  which  is  exactly  analogous  to  the  construction  of  a 
measure  on  path-space  via  the  Feynman-Kac-Nelson  Formula.  Accord ing  to  th4*<— 

viewpoint,  the,  Kalman-Bucy  filtering  problem,  aamcir^ the  filtering  of  Gauss-Markov 
processes  in  the  presence  of  additive  white  Gaussian  noise  occupies  the  same  role  as 
the  Ornstein-Uhlenbeck  process  (finite  or  infinite-dimensional)  in  Quantum  Mechanics 
or  Quantum  Field  Theory.  That  this  analogy  may  be-  docpc^  is  borne  out  by  the  fact 
that  a  solvable  Lie  algebra,  the  oscillator  algebra  which  contains  the  Heisenberg 
algebra  as  a  derived  algebra  is  intrinsically  attached  to  the  Kalman-Bucy  Filtering 
problem.  I  have  also  shown  th at,  the  problem  of  non-linear  filtering  of  diffusion 
processes^admiti^ a  stochastic  variational  interpretation  [FI cming-Mrttsfir~lr93Zfr  The 
objective  of  this  paper  is  to  strengthen  these  analogies  further  with  a  view  to 
showing  the  close  relationship  of  estimation  theory  to  statistical  mechanics.  The 
motivation  for  this  comes  from  problems  of  estimation  and  inverse  problems  related 
to  image  processing. 

In  order  to  carry  out  this  program  it  is  necessary  to  generalize  these  ideas  to 
filtering  problems  for  infinite-dimensional  processes^ jwhere  we  are  forced  to  work  in 
the  context  of  generalized  random  fields.  -  There  are  two  types  of  processes 
involved:  continuous  processes  such  as  infinite-dimensional  Ornstein-Uhlenbeck 

processes  and  their  L2-functionals  which  represent  intensities  of  images  and 
processes  of  a  "discrete*-nature  which  will  represent  'boundaries''  of  images.  The 
most  interesting  models  are  obtained  when  these  processes  are  coupled  according  to  a 
probabilistic  mechanisms.  The  'discrete^processes  should  be  thought  of  as  gauge 
fields  and  will  be  a  process  on  connection  forms. 

Although  the  estimation  problems  of  interest  are  naturally  viewed  in  the  context 
of  random  fields  which  are  independent  of  time,  the  best  way  to  obtain  sample 
functions  for  these  fields  is  to  simulate  it  via  finite  or  inf inite- dimensional 
stochastic  differential  equations  whose  invariant  distribution  coincides  with  the 


distribution  of  the  time-independent  random  field.  This  Monte-Carlo  simulation 
procedure  is  the  same  idea  as  stochastic  quantization,  an  idea  advanced  by  Parisi 
(cf.  Parisi-Wu  1981)  and  recently  studied  in  a  rigorous  manner  for  (Pp>2  fields  in  a 
finite  volume  by  Jona-Lasinio  and  Mitter,  1984.  The  problem  which  are  of  interest 
here  are  filtering  problems  associated  with  these  stochastic  fields  obtained  by 
introducing  observations  which  are  "local*  and  studying  the  behaviour  of  these 
filters  as  t  ->®.  To  make  any  progress  however,  one  would  have  to  work  with  lattice 
approximations  of  these  stochastic  fields  and  reduce  the  filtering  problems  to  a 
finite  dimensional  situation  and  even  here  there  are  severe  technical  difficulties. 
When  the  observations  are  however  local  and  considered  to  be  on  the  stationary  Gibbs 
field  the  problems  amounts  to  looking  at  the  invariant  distribution  of  a  stochastic 
differential  equation  with  a  coupling  coming  from  the  observations. 

The  main  objective  of  this  paper  is  to  explore  these  relationships  between  problems 
of  estimation  and  stochastic  quantization  (stochastic  mechanics)  at  a  conceptual 
level.  The  detailed  technical  discussion  will  appear  elsewhere. 

2.  SIGNAL  MODELS  FOR  IMAGE  PROCESSING 

In  order  to  treat  problems  of  Image  Processing  in  a  probabilistic  framework  we  need 
probabilistic  models  for  the  signals  in  question.  These  signals  are  various 
attributes  of  images  such  as  intensity  of  images  and  boundaries  between  smooth  parts 
of  images.  The  probabilistic  models  we  choose  are  Gibbsian  random  fields  and  are 
borrowed  from  statistical  mechanics.  These  models  for  image  processing  have  been 
used  by  several  authors  recently  (c.f.  Geman  and  Geman  1984,  Grenander  1984, 
Marroquin  1985).  The  exposition  of  signals  models  given  below  follows  Sinai  1982. 
The  signal  models  we  wish  to  consider  correspond  to  statistical  mechanical  models  on 
a  finite  lattice  and  we  shall  take  this  lattice  to  be  Z  with  the  Euclidean  norm. 
The  sample  space  Q  consists  of  functions  p:Z  ->  I :  x=(xj  ,Xj)  -»0(x),  where  f  is  a 
finite  set,  a  homogeneous  space  of  a  compact  Lie  Group  with  the  natural  o-algebra  of 
Borel  Sets  or  R* .  The  sample  space  Q  is  termed  a  configuration  space  in  statistical 
physics.  For  VC  Z2 ,  a  finite  subset,  we  denote  by  p(V)  =  (j}(x)|xeV)  and 
VC  Z2  =  {0<V)|VCZ2}.  finite. 

2 

For  a  non-empty  finite  subset  VC  Z  we  are  given  a  function  I:Q(V)  — >  R:0(V)  — ) 
I(0(V))  which  is  called  the  potential.  I(p(V)  is  the  joint  interation  energy  of 
0(x)  inside  the  domain  V. 

For  an  arbitrary  finite  WCZ  ,  we  define  the  energy  H(^(w))  of  the  configuration  p 
in  the  domain  W  as 


(2.1) 


H(#(w))  -  ^  !<#<▼>>• 
vCw 

The  sum 

H(*(w)  |p(Z2\W))  =■  ^  I(*(V))  (2.2) 

vnw=* 

vnz2\w=* 

2  2 

is  the  interaction  energy  between  the  configuration  0(W)  and  p(Z \W) ,  where  p(Z \W) 
is  the  boundary  condition.  The  total  energy  of  the  configuration  p(W)  is  the  sua 
B(p(W))  +  H(p(Z2\W)). 

The  Haailtonian  is  defined  as  the  formal  series. 

H(#)  =*  J  I(*(V)).  (2.3) 

V 

1 

where  V  ranges  over  all  finite  subsets  of  Z  .  The  model  of  most  interest  to  us  is 
the  2-dimensional  Ising  model  when  I  ■  (1,  -1}  (spins),  and  Hp(V))  *  0 
unless  V  «=  (x.yj  with  1 1 x— y  1 1  *  1. 

We  take  I(p(V))  =  J*(x)*(y)  where  J  =  +  1. 

B(p)  =  -J  J  p(z)p(y)  (2.4) 

(x.y) 

ll*-y|K 

This  Hamiltonian  is  translation  invariant  and  reflection  invariant.  If  J*+l,  the 
model  is  ferromagnetic  and  J*-l  corresponds  to  an  antiferromagnetic  model. 

The  other  model  which  will  be  of  interest  to  us  is  the  Ising  model  with  an  external 
magnetic  field  with  Hamiltonian 

H (p)  *  -J  ^  p(z)p(y)  -  h  ^  p(z) .  (2.5) 

(x.y)  xeZ2 

I  |x-y  I H1 

where  h  may  be  random. 


Finally  a  case  of  interest  to  us  is  the  Hamiltonian 


(2.6) 


H(0)  *  ^  J(x,y)p(x)0(y)  -  &  ^  0(x) 

(x,y)  xeZ2 

1  |x-y | l-i 

where  J(x.y)  is  random.  This  corresponds  to  a  spin  glass. 

Let  a  measure  be  given  on  i  and  for  every  VC  Tr  we  consider  the  product  measure 


TTdX(*<s 


))  *  dp. 


We  are  interested  in  Gibbs  distributions  in  the  domain  V  which  is  a  probability 
distribution  on  Q(V)  whose  density  with  respect  to  dp  is  given  by 


(2.7) 


exp(-H(p( V) ) ) 

J  exp(-H(p(V)  )dp 

Z  =  /  exp(-H(p( v) )dp  is  called  the  partition  function. 


This  corresponds  to  the  so-called  Gibbs  distributions  in  V  with  free  boundary 
conditions  (Dirichlet  Boundary  conditions  in  the  literature  of  quantum  field 
theory).  The  signals  p  we  shall  be  interested  in  will  have  Gibbs  distributions 
given  by  (2.7)  and  is  defined  by  prescribing  the  potential  I. 

For  modelling  intensities  of  images  we  shall  typically  use  a  Gibbs  distribution 
corresponding  to  a  Hamiltonian  of  the  2-dimensional  Ising  model.  Boundaries  between 
smooth  patches  of  images  will  be  modelled  as  Gibbs  distributions  on  the  dual 
lattice.  We  shall  discuss  this  in  a  later  section. 

2.1  Simulation  of  the  Gibbs  Distribution 


Sample  function's  of  the  Gibbs  distribution  are  obtained  by  constructing  a  Markov 
chain  whose  states  correspond  to  the  configurations  of  the  Lattice  field  at  time 
points  1,2....  in  such  a  way  that  it  has  a  unique  invariant  measure  as  the  Gibbs 
measure  exp(-H(0( V) ) )dp.  This  chain  clearly  has  to  be  reversible.  Various 
algorithms  for  creating  such  a  chain  are  known  of  which  the  Metropolis  algorithm  is 
the  best  known.  In  practice  one  may  have  to  deal  with  a  very  large  subset  of  a 
lattice  and  the  random  variables  at  the  lattice  points  may  take  values  in  R1 .  In 
this  case  it  may  be  useful  to  make  a  diffusion  approximation  and  simulate  a 
stochastic  differential  equation  with  the  same  properties  as  above.  This  is  the 
analogy  to  stochastic  quantization  we  referred  to  before. 

One  therefore  has  to  study  the  stochastic  differential  equation  of  the  basic  form: 


v  '  t 


d#(t,x)  *  -D  H(0( t ,x) )  +  dw(t,x) 

? 

*(0,x)  -  *Q(x) 


(2.8) 


where  in  general  is  a  generalized  random  field  and  denotes  'functional* 
derivative  with  respect  to  0.  The  questions  of  interest  is  to  construct  the  measure 
in  the  path  space  of  +  and  prove  that  the  stochastic  differential  equation  (2.8)  has 
a  unique  invariant  measure  with  density  (with  respect  to  an  appropriate  measure) 
exp(-H(0)).  The  interest  in  this  model  for  generating  Gibbs  distributions  is  that 
only  Gaussian  random  numbers  need  to  be  generated  and  the  computation  is  amenable  to 
parallel  processing. 


3.  STOCHASTIC  MECHANICS.  STOCHASTIC  QUANTIZATION  AND  SIMULATION  OF  IMAGE 


INTENSITIES 

Ve  start  with  some  well-known  facts  relating  the  Feynman-Kac  Formula  and  the 
Girsanov  Formula.  (Carmona  1979.  Simon  1980.  Mitter  1980). 

Let  us  suppose  that  V:Rn  ->  R,  be  measurable,  bounded  below  and  tends  to  as 
|x  |  ->  ®  and  consider  the  Schrodinger  operator  H  «  -A  +  V  where  A  is  the  n- 
dimensional  Laplacian.  Then  H  defines  a  self-adjoint  operator  on  L^(Ra;  dx)  which 
is  bounded  below  and  the  lower  bound  k  (assumed  to  be  0)  of  the  spectrum  of  H  is  an 
eigenvalue  of  H.  Let  y(x)  be  the  corresponding  eigenfunction  of  H,  the  so-called 
ground  state  and  assume  <p(z)  >  0.  We  normalize  y(x)  i.e.  /jy(x) pdx  *  1.  Define 
the  probability  measure  dp  =  |y(x)  |^dx,  and  the  unitary  operator 

U  :  L2(Rn;  dx)  -»L2(Rn;  dp(x)) 

:  f  ->*“1f. 

If  we  define  the  Dirichlet  form  for  f,  g  e  C“(Ra) 

C 


6(f ,g)  =  -i-  f  Vf(x)  ‘  Vg(x)dx 


then  a  calculation  shows 
6(f.g>  =  (/f.g) 

where  (  ,  denotes  the  scalar  product  in  L^(Ra;dp)  and  /’is  the  diffusion  operator 
(self-adjoint,  positive) 


/  e  =  -  -  AS  +  Vb  .  ve 


(3.1) 


Since  f  satisfies  the  equation 


j  Af  +  V(x)f  »  0.  (3.2) 

in  the  sense  of  distributions  (note  that  we  have  taken  X*0) ,  a  direct  calculation 
shows 

V(x)  -  y  (  |?b(x)  |2  -  Ab(x)).  (3.3) 

Now  using  the  Feynman-Kac  formula  for  (3.2) 

-t 

fix)  *  E^lfixit) )oxp(-  V(x(s) )ds) |x(t)  ■  x] ,  (3.4) 

J0 

where  E"  denotes  expectation  w.r.  to  Wiener  measure,  the  properties  of  V,  equation 
(3.3)  and  the  generalized  Ito-differential  rule  (Meyer  1978),  we  see 


L(t)  ■  f  1 (x(0)f (x( t) ) exp(- [  V(x(x))ds) 

^  A 


=  exp[-f  Vb(x(s)).dx(s)  -  y  [  ]Vb(x(s) )  |2ds] 
J0  L  J0 


is  a  (fl,  . >t,  pw)  martingale,  where  is  the  o-field  generated  by  (x(s)  |0  i  s  S.  t) 
and  p“  denotes  Wiener  measure.  Therefore  the  process  (w(t)|t  1  0 )  defined  by 


w(t)  »  x(t)  -  x(0)  +  [  Vb(x(s))ds  (3.6) 

J0 

is  standard  Brownian  motion  with  respect  to  the  measure  p-^  defined  by 

=  Lit)  (3.7) 

dp 

Hence  x(t)  considered  as  a  stochastic  process  on  (Q,  \S'>)  is  a  weak  solution  of 

the  equation 


dx(t)  =  -Vb(x(t))dt  +  dw(t) 
x(0)  =  x. 


(3.8) 


Indeed,  the  stochastic  process  defined  by  (3.8)  is  a  Feller  process,  which  is 
recurrent  and  has  p  as  its  unique  invariant  measure.  Therefore  with  the  ground 


state  of  s  Schrodinger  operator  we  have  attached  a  diffusion  process  which  is 
ergodic . 


The  converse  procedure  is  of  interest  to  us.  Suppose  we  start  with  the  stochastic 
differential  equation  on  Rn  given  by  (3.8)  where  -Vb ( . )  is  a  singular  drift.  The 
case  of  interest  to  us  is  where  b(.)  is  a  polynomial  which  is  bounded  below.  Now, 
typically  the  Novikov  condition 


E  exp(y  J  |'Vb(x(s) )  |2ds)  <  ® 


(3.9) 


will  fail  for  these  drifts  and  hence  the  Girsanov  functional 


L(t)  =  exp(- 


f  Vb(x(s)).dx(s) 
J0 


although  a  super-martingale,  need  not  satisfy 


(3.10) 


Em  (L( t ) ]  =  1 

a:  i  hence  fail  to  be  a  martingale.  Therefore,  the  method  of  proof  to  show  that 
equation  (3.8)  has  a  weak  solution  using  the  Girsanov  formula  will  not  work.  To 
show,  however  that  (3.8)  has  a  weak  solution  and  the  process  defined  by  (3.8)  is 
recurrent  and  possesses  a  unique  invariant  measure,  we  consider  the  operator 

/’=  -  i  A  +  Vb.V,  (3.11) 

and  transform  it  to 

H  =  -  j  A  +  V(x) ,  (3.12) 


where 


V(x)  =  y  (  |Vb(x)  |2  -  Ab ( x ) ) , 
using  the  Gauge  transformation, 

p(x)  ->exp(b(x) )p(x) .  (3.13) 

Then  using  functional-analytic  arguments  (cf.  Segal  1970),  one  shows  that  the 
—  t  H 

semigroup  e  is  indeed,  a  strongly  continuous,  self-adjoint  contractive  semigroup 

2  -ft] 

on  L^dp)  where  fi  is  an  appropriate  probability  measure,  e  1=1,  is  positivity 
preserving  and  improving  and  1  is  the  unique  ground  state.  It  then  follows  that 


— tH 

e  is  ergodic  with  4  as  its  unique  invariant  measure.  To  carry  out  this  program 
rigorously,  one  would  use  hyper-contractive  estimates  of  Nelson  and  Segal  and  hence 
it  is  natural  to  work  with  the  stochastic  differential  equation: 

dx(  t)  =■  -Vb(x(  t)  )dt  +  d$(t),  (3.14) 

where 

d((t)  =  -A£(t)dt  +  dw( t ) ,  (3.15) 

where  A  is  a  symmetric  nxn  matrix.  Therefore  (3.15)  defines  a  generalized  Omstein- 
Uhlenbeck  process.  Let  us  define  formally  the  semigroup, 

(e"tLf)(x)  *  EW[f(x(t))L(t)R(t) |x(t)  =  x] ,  (3.16) 

where 


L(t)  =  exp(-[  Vb(x(x) ) .dx( s)  -  y  [  |Vb(x(s)) |^ds) 

J0  J0 

R(t)  a  exp(- f  Ax(s).ds(s)  -  y  [  |Ax(s)  |2ds) 

Jo  L  ■'o 

and  E“  denotes  expectation  with  respect  to  Wiener  measure.  One  wants  to  write 
(3.16)  in  Feynman-Kac  form,  i.e.  in  the  form 

(e"tLf)(x)  =  Ew[f(x(t))exp(-f  V(x(s)ds) jx(t)  =  x]  .  (3.17) 

J0 

To  do  this  we  need  an  Ito  differential  rule  for  b.  For  this  we  need  that  b  is 
cont innons , 

Tb  e  (Rn;  dx)  and  ib  e  l!  (Rn;  dx). 

loc.  loc. 

The  measure  dp  referred  to  previously  then  is 

-t 

dp  =  exp(-  V(x( s) )ds)dp^ . 

J0 


where 


dp 


o> 


R(t) . 


The  rest  then  follows  from  Nelson  1973,  Segal  1970,  Glimm-Jaffe  1981,  Gross  1972. 


These  ideas  can  be  generalized  to  random  fields  (cf.  Jona  Lasinio-Mitter  loc.cit). 
Let  ACR2  be  a  square  and  consider  the  Laplacian  A  on  R2  with  Dirichlet  Boundary 
conditions,  which  is  a  self-adjoint  operator  on  V  =  L2(A).  Let  H2(A)  be  the  Sobolev 

y 

space  of  functions  f  on  R  with  norm 


"'"WJ1- 


A+l)1/2f(x)  2dx 


and  let  V'  *  H”2(A)  be  the  dual-space  of  distributions  with  norm 

|  Ml2,  “  f  |(-A+1>  1/2i»(x)  |2dx. 

H  JA 

Let  (ic  denote  the  Gaussian  measure  on  V*  with  mean  zero  and  covariance  operator  C 
given  by 


C  =  (-A+1)'1. 


(3.19) 


Consider  the  stochastic  differential  equation  on  V' : 


d0(t)  =  -  y  C  %(t)dt  +  dw(  t)  , 


(3.20) 


0(0)  =  0.  0  <  e  <  j 


interpreted  in  the  weak  sense,  where  w(t)  is  a  V'-valued  Brownian  motion  with 
covariance  8  min(t,s)  (defined  using  test  functions). 

One  can  show  that  this  stochastic  process  defines  a  measure  on  the  path  space 
C(0,®;V')  and  we  denote  this  measure  by  P.  Define  the  semigroup 


(e  °f)(0)  =  Etf <0( t) )  |0(t)  =  0] , 


(3.21) 


where  f  is  a  'suitable*  function  and  E  denotes  expectation  w.r.  to  P-measure.  One 
shows  using  the  work  Segal  and  Nelson  referred  to  above  that: 
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e  is  a  strongly  continuous  semigroup  on  L  (dp  ) , 

c 

which  is  contractive,  self-adjoint. 


is  positivity  preserving  and  improving  on  L  (dp  ); 

c 


(iii) 


'1  =  1. 


is  a  contraction  on  L^fdp^);  1  1  p  <  • 


is  hypercontractive : 


L  (dp  ) 
c 


CIMI 


L2(dp  ) 
c 


for  ^  e  l/(dpc)  and  t  sufficiently  large.  Moreover  the  stochastic  process  defined 
by  (3.20)  is  ergodic  and  has  as  its  unique  invariant  measure.  Therefore  the 
Ornste in-Uhlenbeck  process  defined  by  (3.20)  is  the  stochastic  quantization  of  the 
free  euclidean  field.  In  view  of  the  well  known  relationship  between  the  free  field 
and  the  Ising  model  via  the  lattice  approximation  (cf.  Guerra-Rosen-Simon  1975),  we 
see  that  simulating  the  stochastic  differential  equation  (3.20)  is  a  powerful  way  of 
obtaining  sample  functions  of  the  free  field  (Ising  field)  when  the  lattice  is 
large.  This  can  be  accomplished  in  a  parallel  machine  using  multi-grid  methods. 

As  remarked  by  Guerra-Simon-Rosen,  the  Ising  (nearest-neighbour)  nature  of  the 
lattice  fields  are  not  destroyed  if  we  use  non-gaussian  random  variables  at  the 
lattice  points.  This  can  be  accomplished  by  studying  the  stochastic  differential 
equation  on  V' 

d*(t)  =  -  j  C_8p( t)dt  +  XC1_e:p(t)3:dt  +  dw(t) 


(3.22) 


#<0>  -  #,  0  <  e  <  yj- 


where  :  denotes  Wick  ordering  with  respect  to  the  covariance  C  (cf.  Glimm-Jaffe 
1981).  Using  an  appropriate  Ito  rule  (proved  by  approximation  using  tame 
functions),  considering  an  aprpoximation  of  (3.22)  using  a  spectral  basis  related  to 
(-A+1)  on  L^(A),  and  using  limiting  arguments  coupled  with  hyper-contractive 
estimates  (see  the  discussion  in  the  first  part  of  this  section),  Mitter-Jona 
Lasinio  show  that  the  semi-group  defined  by 


(e'tLf)(#)  -  E(f(0(t))exp(?(t))  |#(t)  -  f 


(3.23) 


where 


t  2  t 

5(t)  »  -  y  J  (:#(s)3:,  dw( s) )  -  -y-  J  (:#(s)3:,  C1  e:0(s)3:)ds  (3.24) 


where  0(t)  is  the  Ornstei&-Uhleabeck  process  defined  by  (3.20)  satisfy 


(i)  e-t^  1=1  and  e-tL  is  a  contraction  on  L*(d|0. 

6 


W\ 


-tL  2 

(ii)  e  is  a  strongly  continuous,  self-adjoint  semigroup  on  L  (dp) 

X  f  4 

with  dii  =  exp(-  —  I  :p  (x):dx)dp  ,  positivity  preserving  and 

4  JA  ° 

improving  and  1  is  the  unique  ground  state. 

(iii)  e  is  ergodic  and  mixing. 


This  then  allows  them  to  show  that  the  stochastic  differential  equation  (3.22) 
hst  a  weak  solution  and  has  as  its  unique  invariant  measure  p. 

Since  the  stochastic  differential  equations  under  consideration  defines  a 
Markov  process  which  is  ergodic  (and  mixing),  we  have 

lia  E  ((?(t),  f.)(*(t) .f,)  ...  (p(t).f  )) 
ttco  12  n 


J(^.fj) . . .  (^,  fn)d(i 
Jdn 


where  the  f-  are  test  functions.  This  enables  us  to  compute  spatial  statistics  of 
the  time-in  independent  random  field  from  the  simulation  of  the  stochastic 
differential  equation. 

4 .  Some  Estimation  Problems  for  Random  Fields  on  a  Lattice  (cf.  Marroquin  198S) 


Let  VCZL  be  a  finite  subset.  Consider  a  random  field 


f:V 


(4.1) 


with  Gibbs  distribution  having  a  density  with  respect  to  dp  (see  (2.7))  given  by 


P£  =  Z_1exp(-  ~  Hq ( f ) ) , 


(4.2) 


- -  - .  1 
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and  Tq  >  0  is  a  parameter  (temperature). 

We  observe  a  corrupted  version  of  f  given  by 

g(j>  =  f[G(jtf),  y( j)],  j  a  SCV.  (4.3) 

where  H(j,.)  is  a  fnnction  with  'local*  support  and  i  is  invertible  in  the  sense 
that  y(j)  »  f_1(g(j),  G(jjf)).  We  shall  assume  that  jKi)  and  fK j)  are  indepenent 
and  also  .)  and  f(.)  are  independent.  Let  us  suppose  that  the  distribution  of 
y( .)  with  respect  to  dp  has  density 

Py  >  0  (4.4) 

Define  the  functions 

K( i ; f  ,g(  i) )  =  -In  p^(«"X(g(i).  G(  i  j  f ) ) .  (4.5) 

Then  the  conditional  density  p£/g  can  be  written  as 

Pf/g  =  Z'p  exp(-IMf.g)),  (4.6) 


Hp( f ,g)  *  tj—  ( f )  +  ^  K( i ; f , g ( i) )  (4.7) 

0  isS 

and  Zp  is  a  normalizing  constant. 

We  can  now  provide  a  physical  interpretation  of  the  posterior  distribution,  by 
considering  that,  while  the  prior  distribution  (4.2)  describes  the  behavior  of  a 
free  field  in  thermal  equilibrium,  the  distribution  (4.7)  describes  the  behavior  of 
the  same  field  coupled  with  a  fired  (but  spatially  varying)  external  field  whose 
valne  is  given  by  g.  The  functions  E  whose  magnitude  depends  on  the  noise  variance, 
can  then  be  interpreted  as  the  coupling  strengths  between  the  two  fields.  This 
coupled  system  is  also  Gibbsian  and  if 

G(i;f)  =  G( i ; f ( i ) ) 

the  Markovian  structure  of  this  field  will  be  identical  to  that  of  the  original 
field. 


The  importance  of  this  interpretation  lies  in  the  fact  that  the  optimal  estimate  of 
f  can  be  obtained  simply  by  observing  the  equilibrium  behavior  of  this  coupled 


In  particular  if  Hg(f)  is  the  Hamiltonian  of  a  ferromagnetic  Gaussian  Ising  field 
and  the  observation  is, 

g(i)  =  f ( i)  +  n(i) ,  (4.8) 

with  n(i)  Gaussian,  then  the  coupled  field  corresponds  to  a  Ising  field  coupled  to  a 
random  external  field. 


The  estimates  that  are  of  interest  to  us  depends  on  the  choice  of  the  problem.  The 
two  most  important  estimates  for  image  reconstruction  purposes  are: 

(i)  f  *  E( f  | g )  (Conditional  mean) 

(ii)  f  =  Arg  Max  p.i  (fjg)  (MAP) 

f  |g 

From  (4.6)  and  (4.7)  the  MAP  estimate  corresponds  to  minimizing  the  Hamiltonian 
Hp(f.g)  with  respect  to  f. 

4.1  Block  Spin  Transformation  for  MAP  Estimation  (Marroquin  1985) 

In  order  to  illustrate  the  analogy  with  Statistical  Physics  further  we  consider  the 
.MAP  estimation  of  a  binary  Ising  field  withe  the  observations  taken  as  the  output  of 
a  binary  symmetric  channel  with  error  rate  e«  Therefore  J  *  (1,-1)  and  the 

observation  model  is  given  by 


\l-s  if  g  =  f 
P(g(i)  |f(i))  =  | 


e  if  g .  £  f . . 
1  1 


Then  it  is  easy  to  see  that 


ep(f.g) 


^  f ( i) f ( j)  +  a  'S  f(i)g(i) 

0  ufj)  i 

Ili-ilH 


(4.8) 


and 


a  =  ln(— -) 
e 


Minimizing  Hp  is  now  a  combinatorial  optimization  problem. 


4.1.1  Simulated  Annealing  and  Global  Minimization 

Simulated  annealing  is  a  new  technique,  developed  by  Kirkpatrick  et  si  (j.983)  for 
the  solution  of  combinatorial  optimization  problems.  It  is  based  on  the  idea  that 
any  cost  functional  of  N  variables,  each  of  which  can  take  values  on  some  finite 


set,  can  be  considered  as  the  Hamiltonian  (Energy)  of  a  physical  system  whose  state 
corresponds  to  a  particular  value  of  these  variables.  Therefore,  we  can  use,  say, 
the  Metropolis  algorithm  to  generate,  at  any  given  'temperature*  T  (which  now 
becomes  a  parameter  of  the  optimization  process)  samples  from  the  corresponding 
Gibbs  measure.  As  T  1  0  this  measure  should  converge  to  a  measure  which 
concentrates  on  the  states  of  minimum  energy,  the  state  of  the  system  in  thermal 
equilibrium  at  zero  temperature  will  correspond  to  the  value  of  f  that  minimizes  the 
energy  H(f)  globally. 

One  serious  difficulty,  however,  is  that  attaining  thermal  equilibrium  might  take  a 
very  long  time  at  low  temperatures.  Kirkpatrick's  idea  was  to  start  at  a  relatively 
high  temperature  (where  thermal  equilibrium  is  reached  very  fast),  and  then,  to 
slowly  cool  the  system,  until  'freezing*  occurs  and  the  state  stops  changing. 

The  analysis  of  this  algorithm  is  presented  in  the  Appendix. 

4.1.2  Block  Spin  Transformations 

In  the  case  of  the  MAP  estimator,  the  efficiency  of  the  Simulated  Annealing 
algorithm  for  the  minimization  of  can  be  improved  by  defining  large  'blocks*  of 
sites  (in  a  manner  that  is  reminiscent  of  the  'block-spin*  strategy  used  by  Wilson 
(1975)  in  connection  with  the  renormalization  group  approach  to  the  study  of 
critical  phenomena);  the  optimal  estimate  for  the  average  value  of  the  field  in  each 
of  these  blocks  is  found,  and  then  progressively  refined  by  subdividing  the  blocks 
in  successive  annealing  stages.  We  will  now  show  that,  if  we  use  a  maximum  entropy 
assumption,  the  structure  of  the  MAP  estimation  process  for  Ising  models  is 
invariant  under  the  'blocking*  transformation;  this  means  that  the  ground  state 
(i.e.,  the  MAP  estimator)  of  the  aggregated  process  (with  blocks  of  size  L)  also 
corresponds  to  that  of  an  Ising  model  with  a  coupled  external  field,  in  which  the 
natural  temperature  is  scaled  by  a  factor  of  1/L,  and  the  noise  (coupling)  parameter 
by  a  factor  of  L  .  As  a  consequence  of  this  scaling,  the  final  temperature  for  the 
simulated  annealing  of  this  smaller  network  will  be  approximately  L  times  larger 
than  for  the  original  problem. 

If  we  denote  by  V(f(i),  f(j))  »  f(i)f(j)  and  q(f(i),  g(i))  *  f(i)g(i),  let 
Vc(f(i),  f(j))  and  qc(f(i),  g(i))  be  the  extension  to  RxR  of  V  and  q  respectively. 
We  then  write 

H  ( f  ,g)  ■  ■=—  5  V  (f(i).f(j))  +  o  5  q  (f(i).g(i))  (4.9) 
p  l-  L  c  La 

i,j*l  i 


We  will  now  derive  an  expression  for  the  energy  in  the  'block  spin*  case.  Let  us 


partition  the  original  lattice  into  square  blocks  of  side  L.  The  'block 
observations*  g^  will  now  be  the  density  of  l's  on  each  block,  i.e.. 


*L(i) 


1 


^  g(j)  . 
jsB. 


where  is  the  i**1  block.  The  'block  field*  f^  is  defined  in  a  similar  way. 


For  a  given  f^»  we  compote  the  energy  by  assuming  a  maximum  entropy  configuration, 
which  occurs  when  the  l's  that  correspond  to  the  given  density  f^(i)  are  randomly 
distributed  within  the  block.  The  energy  will  have  three  terms: 


1.  Interactions  between  adjacent  blocks: 

The  interaction  between  two  adjacent  blocks  i  and  j  will  be: 


I.  . 

ij 


[-1  .  (P 


11 


V 


+  1 


(P 


10 


p01» 


L 


where  Pkl  is  the  probability  of  having  an  element  with  state  k  on  block  i  adjacent 
to  an  element  with  state  1  on  block  j: 


P11  =  f1(i)fL(j> 

*01  ■  'L(j)(1  -  fL(i)) 

P10  *  fL<iHl  -  fL(J>> 

P00  "  <l  '  (L(,)>(1  '  EL<j>> 


Substituting  these  values  we  get: 

I. .  =  L[2(f.(i)  +  f.(j))  -  4f. ( i) f. ( j)  -  1] 
xj  L  1  L  L 

2.  Interactions  within  each  block:  the  internal  interaction  1^  is: 

I.  =  2L(L-lX-4f.  (i)2  +  4 f T  (  i )  -  1) 
l  L  L 


3.  Interaction  with  the  observations: 

Assuming  that  the  l's  in  the  observations  and  in  the  field  are  independently 
distributed  we  get: 


Iobs(i)  -  aL‘[fL(i)(l  -  *L(i))  ♦  (f  “  fL< i) >*L( i)3  - 
-  aL2[fL(i)  +  gL(i)  -  2fL(i)gL(i)] 


Finally,  the  Hamiltonian  takes  the  form 

vv  ■  T?  +  - 

°  i.j  i 

=  Lfcp-  J  [2(fL(i)  +  fL(j))  -  4f1(i)fL(j)  -  1]  + 
°  i.j 

+  y~  (L  "  1)  J  (~4fL(i)2  +  4fL(i)  -  1)  + 

^  i 

+  oL  J  (fL(i)  +  SL<i)  -  2fL( i) gL< i) } 


note  that  the  sums  are  taken  over  pairs  of  adjacent  blocks,  and 
respectively.  For  L  =  1 ,  this  expression  rednces  to  (4.9)  with 

V  <f(i),f(j))  =■  2(f(i)  +  f ( j) )  -  4f(i)f( j)  -  1 
c 

q  (f ( i) ,  g(i))  -  f(i)  +  g(i)  -  2f( i)g(i) . 
c 

For  L  >  1,  the  quadratic  terms  of  are: 

Y“  1-4  J  fL(i)fL(J)  -  8(L  -  1)  J  fL(i)2] 

°  i.j  i 


and  since 


-  2  J  fL<i)fL(j)  +  2  }  fLU)2  *  }  (fL(i)  '  fL(j))2  "  ° 
i.j  i  ij 


it  follows  that 


-4  J  fL(i)fL(j)  -  8(L  -  1)  J  fL<i)2  < 
i.j  i 

<  -(4  +  8(L  -  1) )  J  fL( i)2  <  0 
i 

which  implies  that  is  negative  definite  for  L  >  1,  and  therefore,  its  minima, 
constrained  to  the  hypercnbe  [0,1]^  (N^  is  the  total  nnmber  of  blocks)  will  always 

lie  in  a  corner  of  such  hypercnbe  which  means  that  we  can  use  simulated  annealing  to 
find  the  global  minimum  of  H^,  constraining  the  search  to  {0,1}  .  In  this  case, 

the  energy  to  be  minimized  takes  the  simpler  equivalent  form  (up  to  an  additive 
constant) : 

°L-t7l  }  V(£L(i).fL(j)>  ♦  oL2  (f  (».,  (!)) 

°  i.j  i 

The  minimum  energy  solutions  for  each  L  can  be  interpreted  as  'coarse  scale* 
representations  of  the  original  pattern  f.  Once  a  solution  is  obtained,  the  next 
refinement  (for  blocks  of  size  L/2)  can  be  efficiently  obtained  using  the  previous 
solution  as  a  starting  point,  and  initiating  the  annealing  process  at  a  lower 
temperature . 

5.  FINAL  REMARKS 

Due  to  lack  of  space  we  are  unable  to  discuss: 

2 

(i)  Estimation  of  boundaries  using  coupled  models  on  the  lattice  Z  and  the 

2 

dual  lattice  of  bonds  on  Z  , 

(ii)  Estimation  of  the  field  and  temperature  parameter  T  using  the  innovations 
field 

(iii)  other  problems  in  computation  vision  such  as  depth  from  stero- images,  shape 
from  shading  etc. 

A  preliminary  account  can  be  found  in  Marroquin  (1985). 

Appendix  on  Simulated  Annealing. 

Let  0  be  a  finite  set  and  let  ]0 |  denote  the  cardinality  of  Q.  Consider  the  problem 
of  minimizing  the  energy  function: 

0:0  ->R:  i  ->U.. 

1 

Let  Nq  *  NU(0)  where  N  are  the  natural  numbers,  >  0,  k  s  Nq  be  a  sequence  of 

real  numbers.  Consider  a  Markov  chain  {x^}kg^  with  1-step  transition  matrices 
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(p( fc.k+1 ! and  some  initial  distribution  constructed  on  a  probability  space  and 
let  *  P{ij.“i] ,  itQ,  IsNq.  The  'annealing  chain*  is  simulated  as  follows: 

Suppose  Xj,®i.  Then  generate  a  random  variable  y  with  P(y»j)  *  q^j  where 
Q  ®  a  stochastic  matrix.  Suppose  y*j,  and  then  define 

j  if  D.  <  D. 

J  1  (U.-U.)/T 

=  j  if  IL  >  with  probability  e  1  J 

i  otherwise 

Ve  may  think  of  the  annealing  algorithm  as  a  probabilistic  descent  algorithm  where 
the  Q-matrix  represents  some  prior  distribution  of  'directions',  transitions  to  some 
or  lower  energies  are  always  allowed  and  transitions  to  higher  energies  are  allowed 
with  positive  probability  which  tends  to  0  as  k  -4®. 

Hajek  1985  has  given  necessary  and  sufficient  conditions  on  the  rate  at  which 
should  go  to  zero  such  that  P{x^  s  S*}  -4  1  as  k  -4®  where  S*  is  the  set  of  global 
minimizing  energy  states.  In  this  analysis  the  stochastic  matrix  Q  has  to  be 
irreducible  and  satisfy  a  weak  reversibility  condition. 

In  Gelf and-Mitter  1985  a  finite-time  analysis  of  the  annealing  chain  has  been 
performed  as  well  as  a  result  on  the  rate  of  convergence  of  Pfxj.  e  S*}  -4  1  as  k  -9® 
has  been  given,  under  somewhat  weaker  hypotheses  on  Q. 

Finally,  in  Tsitsiklis  1985,  necessary  and  sufficient  conditions  for  P{x^  e  S*}  -41 
as  k  -4  ®  are  given  by  considering  the  annealing  chain  as  a  singularly  perturbed 
Markov  chain  operating  under  different  time-scales  (under  hypotheses  weaker  than 
that  of  Hajek) . 

Space  does  not  permit  us  to  give  a  detailed  account  of  these  results. 
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